HopCast: Calibration of Autoregressive Dynamics Models

Shahid, Muhammad Bilal, Fleming, Cody

arXiv.org Artificial Intelligence

Deep learning models are often trained to approximate dynamical systems that can be modeled using differential equations. These models are optimized to predict one step ahead and produce calibrated predictions if they can quantify uncertainty, as deep ensembles do. At inference time, multi-step predictions are generated via autoregression, which requires a sound uncertainty propagation method (e.g., Trajectory Sampling) to produce calibrated multi-step predictions. This paper introduces HopCast, an approach that uses a Modern Hopfield Network (MHN) to learn the residuals of a deterministic model that approximates the dynamical system. The MHN predicts the density of the residuals from a context vector at any timestep during autoregression. This approach produces calibrated multi-step predictions without uncertainty propagation and turns a deterministic model into a calibrated probabilistic one. This work is also the first to benchmark existing uncertainty propagation methods on calibration error for multi-step predictions with deep ensembles.
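The core idea can be sketched in a few lines. This is a toy illustration under our own assumptions, not the paper's implementation: the MHN's attention-based retrieval over a memory of (context, residual) pairs is stood in for by a simple k-nearest-neighbor lookup, and the dynamical system is a one-dimensional toy.

```python
import numpy as np

# Toy dynamical system and a slightly biased deterministic learned model.
def true_step(x):
    return 0.95 * x + 0.1 * np.sin(x)

def model_step(x):
    return 0.95 * x

rng = np.random.default_rng(0)

# Memory of (context, residual) pairs from training rollouts; the paper's
# MHN would retrieve from such a memory via attention.
contexts = rng.uniform(-2.0, 2.0, size=500)
residuals = np.array([true_step(c) - model_step(c) for c in contexts])

def predict_residual_density(x, k=20):
    # Hypothetical stand-in for MHN retrieval: mean/std of the residuals
    # attached to the k nearest stored contexts.
    idx = np.argsort(np.abs(contexts - x))[:k]
    r = residuals[idx]
    return r.mean(), r.std()

# Autoregressive multi-step rollout with per-step residual correction:
# no uncertainty propagation, yet each step carries its own interval.
x, lo, hi = 1.5, [], []
for _ in range(10):
    mu, sigma = predict_residual_density(x)
    x = model_step(x) + mu          # corrected point prediction
    lo.append(x - 2 * sigma)        # per-step predictive interval
    hi.append(x + 2 * sigma)
```

The point of the sketch is the loop structure: the deterministic model advances the state, and the retrieved residual density supplies both a correction and a calibrated-style interval at every step.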


FlashBack: Efficient Retrieval-Augmented Language Modeling for Long Context Inference

Liu, Runheng, Xiao, Xingchen, Huang, Heyan, Chi, Zewen, Wu, Zhijing

arXiv.org Artificial Intelligence

Retrieval-Augmented Language Modeling (RALM), which integrates large language models (LLMs) with relevant documents from an external corpus, is a proven method for enabling an LLM to generate information beyond the scope of its pre-training corpus. Previous work uses retrieved content by simply prepending it to the input, which degrades inference efficiency because the Key-Value (KV) cache cannot be used efficiently. In this paper, we propose FlashBack, a modular RALM designed to improve the inference efficiency of RALM via an appending-context pattern while maintaining decent performance after fine-tuning with Low-Rank Adaptation. FlashBack appends retrieved documents at the end of the context, rather than prepending them, so the KV cache can be reused. We also introduce marking tokens, two special prompt tokens that mark the boundary of the appended context during fine-tuning. Our experiments show that FlashBack maintains decent generation quality as measured by perplexity, and its inference speed is up to $4\times$ faster than the prepending counterpart on a 7B LLM (Llama 2) in our runtime test. By bypassing unnecessary re-computation, this heightened efficiency substantially reduces inference cost.
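Why appending helps can be shown with a toy model of the KV cache (our illustration, not the paper's code): cached key/value entries are valid only for the longest token prefix that is unchanged between requests.

```python
# The KV cache is reusable only over the unchanged token prefix.
def cached_prefix_len(old_tokens, new_tokens):
    n = 0
    for a, b in zip(old_tokens, new_tokens):
        if a != b:
            break
        n += 1
    return n

context = ["<s>", "user", "query", "tokens"]
doc = ["retrieved", "document", "tokens"]

# Prepending: every cached position shifts, so nothing is reusable.
prepended = doc + context

# Appending (FlashBack-style): the original prefix is untouched, so its
# KV entries can be reused and only the appended span is recomputed.
# The marking tokens shown here are hypothetical placeholders.
appended = context + ["<MARK>"] + doc + ["</MARK>"]

print(cached_prefix_len(context, prepended))  # 0 reusable positions
print(cached_prefix_len(context, appended))   # 4 reusable positions
```

In a real transformer the same logic applies per position: prepending shifts all absolute positions, invalidating every cached entry, while appending leaves the cached prefix intact.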


In-Context Exemplars as Clues to Retrieving from Large Associative Memory

Zhao, Jiachen

arXiv.org Artificial Intelligence

In recent years, large language models (LLMs) have garnered significant attention due to their ability to revolutionize natural language processing (NLP) by demonstrating impressive language understanding and reasoning capabilities (7; 6; 45; 56; 44). LLMs are first pretrained on extensive data using the language modeling objective, where the model predicts the next token given a context. Without fine-tuning on task-specific data, LLMs leverage in-context learning (ICL), also referred to as few-shot prompting, to make predictions. Through ICL, LLMs can infer underlying patterns of the input query from given in-context exemplars, such as a set of input/output pairs, and use them to complete the response. However, the effects of in-context exemplars on downstream ICL performance, and guidelines for formulating those exemplars (e.g., how to select exemplars and how many to use), remain unclear.


Automatic Context Pattern Generation for Entity Set Expansion

Li, Yinghui, Huang, Shulin, Zhang, Xinwei, Zhou, Qingyu, Li, Yangning, Liu, Ruiyang, Cao, Yunbo, Zheng, Hai-Tao, Shen, Ying

arXiv.org Artificial Intelligence

Entity Set Expansion (ESE) is a valuable task that aims to find entities of the target semantic class described by given seed entities. Various Natural Language Processing (NLP) and Information Retrieval (IR) downstream applications have benefited from ESE due to its ability to discover knowledge. Although existing corpus-based ESE methods have achieved great progress, they still rely on corpora annotated with high-quality entity information, because most of them obtain context patterns from the position of an entity in a sentence. Therefore, the quality of the given corpora and their entity annotations has become the bottleneck that limits the performance of such methods. To overcome this dilemma and free ESE models from the dependence on entity annotation, our work explores a new ESE paradigm, namely corpus-independent ESE. Specifically, we devise a context pattern generation module that utilizes autoregressive language models (e.g., GPT-2) to automatically generate high-quality context patterns for entities. In addition, we propose GAPA, a novel ESE framework that leverages the aforementioned GenerAted PAtterns to expand target entities. Extensive experiments and detailed analyses on three widely used datasets demonstrate the effectiveness of our method. All code for our experiments is available at https://github.com/geekjuruo/GAPA.


CAPoW: Context-Aware AI-Assisted Proof of Work based DDoS Defense

Chakraborty, Trisha, Mitra, Shaswata, Mittal, Sudip

arXiv.org Artificial Intelligence

Critical servers can be secured against distributed denial of service (DDoS) attacks using proof of work (PoW) systems assisted by an Artificial Intelligence (AI) that learns contextual network request patterns. In this work, we introduce CAPoW, a context-aware anti-DDoS framework that adaptively injects latency during communication by utilizing context-aware PoW puzzles. In CAPoW, a security professional can define relevant request context attributes to be learned by the AI system. These contextual attributes can include information about the user request, such as IP address, time, and flow-level information, and are used to generate a contextual score for incoming requests that influences the hardness of a PoW puzzle. These puzzles must be solved by a user before the server begins to process their request; solving them slows the volume of incoming adversarial requests. Additionally, the framework compels the adversary to incur a cost per request, making it expensive for an adversary to prolong a DDoS attack. We include the theoretical foundations of the CAPoW framework along with a description of its implementation and evaluation.
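The score-to-hardness coupling can be sketched concretely. This is one plausible design under our own assumptions (the score range, bit bounds, and hash-based puzzle are illustrative, not the paper's specification): a contextual score in [0, 1] maps to the number of leading zero bits a client's PoW solution must achieve.

```python
import hashlib
import os

# Hypothetical mapping: higher suspicion score => harder puzzle.
def difficulty_from_score(score, min_bits=8, max_bits=20):
    return min_bits + int(score * (max_bits - min_bits))

# Classic hashcash-style puzzle: find a nonce whose SHA-256 digest,
# taken with the challenge, falls below a difficulty target.
def solve_pow(challenge: bytes, bits: int) -> int:
    target = 1 << (256 - bits)
    nonce = 0
    while True:
        digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce
        nonce += 1

challenge = os.urandom(16)
bits = difficulty_from_score(0.3)   # a mildly suspicious request
nonce = solve_pow(challenge, bits)

# Server-side verification costs a single hash, so latency is injected
# asymmetrically: the client pays, the server does not.
digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
assert int.from_bytes(digest, "big") < (1 << (256 - bits))
```

The asymmetry is the point: each extra difficulty bit doubles the client's expected work while verification stays constant, which is what makes a prolonged attack expensive.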


Semantic Lexicon Induction from Twitter with Pattern Relatedness and Flexible Term Length

Qadir, Ashequl (University of Utah) | Mendes, Pablo N. (IBM Research) | Gruhl, Daniel (IBM Research) | Lewis, Neal (IBM Research)

AAAI Conferences

With the rise of social media, learning from informal text has become increasingly important. We present a novel semantic lexicon induction approach that is able to learn new vocabulary from social media. Our method is robust to the idiosyncrasies of informal and open-domain text corpora. Unlike previous work, it does not impose restrictions on the lexical features of candidate terms (e.g., by restricting entries to nouns or noun phrases) while still being able to accurately learn multiword phrases of variable length. Starting with a few seed terms for a semantic category, our method first explores the context around seed terms in a corpus and identifies context patterns that are relevant to the category. These patterns are used to extract candidate terms, i.e., multiword segments that are further analyzed to ensure meaningful term boundary segmentation. We show that our approach is able to learn high-quality semantic lexicons from informally written social media text from Twitter, achieving accuracy as high as 92% on the top 100 learned category members.


Word Segmentation for Chinese Novels

Qiu, Likun (Singapore University of Technology and Design) | Zhang, Yue (Singapore University of Technology and Design)

AAAI Conferences

Word segmentation is a necessary first step for automatic syntactic analysis of Chinese text. Chinese segmentation is highly accurate on news data, but accuracy drops significantly in other domains, such as science and literature. For scientific domains, a significant portion of out-of-vocabulary words are domain-specific terms, and therefore lexicons can be used to improve segmentation significantly. For the literature domain, however, there is no fixed set of domain terms. For example, each novel can contain a specific set of person, organization, and location names. We investigate a method for automatically mining common noun entities for each novel using information extraction techniques, and use the resulting entities to improve a state-of-the-art segmentation model for that novel. In particular, we design a novel double-propagation algorithm that mines noun entities together with common contextual patterns, and use them as plugin features to a model trained on the source domain. An advantage of our method is that no retraining of the segmentation model is needed for each novel, so it can be applied efficiently given the huge number of novels on the web.
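The double-propagation idea can be sketched as an alternating loop. This is our own drastic simplification, not the paper's algorithm: sentences are pre-parsed triples, "patterns" are just the verbs following an entity, and the example names are invented.

```python
# Toy corpus of (subject, verb, object) triples from a hypothetical novel.
sentences = [
    ("李明", "走进", "大厅"),
    ("王芳", "走进", "房间"),
    ("王芳", "看着", "窗外"),
    ("张伟", "看着", "窗外"),
]

seeds = {"李明"}
entities, patterns = set(seeds), set()

for _ in range(3):  # a few propagation rounds
    # Entities -> patterns: contexts observed after known entities.
    patterns |= {verb for ent, verb, _ in sentences if ent in entities}
    # Patterns -> entities: subjects occurring with mined patterns.
    entities |= {ent for ent, verb, _ in sentences if verb in patterns}

print(sorted(entities))
```

Starting from one seed, the first round mines the pattern it occurs with, which recruits a second entity; that entity's other pattern then recruits a third, illustrating how entities and contextual patterns bootstrap each other without retraining any model.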


Web-Scale N-gram Models for Lexical Disambiguation

Bergsma, Shane (University of Alberta) | Lin, Dekang (Google, Inc.) | Goebel, Randy (University of Alberta)

AAAI Conferences

Web-scale data has been used in a diverse range of language research. Most of this research has used web counts for only short, fixed spans of context. We present a unified view of using web counts for lexical disambiguation. Unlike previous approaches, our supervised and unsupervised systems combine information from multiple and overlapping segments of context. On the tasks of preposition selection and context-sensitive spelling correction, the supervised system reduces disambiguation error by 20-24% over the current state-of-the-art.
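The multiple-segment idea can be sketched with toy counts. All n-gram counts below are invented for illustration (real systems would query web-scale n-gram data); the scoring rule shown, summing log counts over every overlapping segment that spans the target slot, is one simple instance of combining overlapping context evidence.

```python
import math

# Hypothetical web counts for n-grams around the blank in
# "interested ___ the results"; each key contains one candidate filler.
counts = {
    ("interested", "in"): 900, ("interested", "on"): 40,
    ("in", "the"): 5000, ("on", "the"): 4000,
    ("interested", "in", "the"): 700, ("interested", "on", "the"): 10,
}

def score(candidate, segments):
    # Sum log(count + 1) over all overlapping segments containing the slot,
    # so every span of context contributes evidence.
    return sum(math.log(counts.get(seg, 0) + 1) for seg in segments)

candidates = ["in", "on"]
best = max(
    candidates,
    key=lambda c: score(c, [("interested", c), (c, "the"), ("interested", c, "the")]),
)
print(best)  # "in"
```

Using several overlapping spans is what distinguishes this from fixed-span approaches: a candidate must look plausible in short and long contexts simultaneously, which is why "in" wins despite "on the" being common on its own.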